
test(jobs-manager): lock spawned-job RESOURCE_REQUESTS/LIMITS default at 8Gi #260

Merged
saadqbal merged 1 commit into develop from fix/job-resource-defaults-745
Jun 16, 2026
Conversation

@saadqbal saadqbal commented Jun 16, 2026

Contributor

What

Adds helm-unittest assertions pinning the rendered per-spawned-training-job RESOURCE_REQUESTS / RESOURCE_LIMITS env to cpu=2,memory=8Gi on both containers that receive it (api + pods-monitor), plus an operator-override case.

Why

The chart is the single effective source of truth for these values — jobs-manager-deployment.yaml always injects them with a templated "cpu=2,memory=8Gi" fallback, and client-runtime's jobs_manager.py only falls back to its own default when they're absent. Those two had silently drifted (the chart said 8Gi; the code's dead-code default was ~202Mi request / 20G limit). This guard renders the template and asserts the value so the drift can't recur unnoticed — the "render-and-assert test" called for in tracebloc/backend#745.
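The fallback behaviour described above can be sketched in Python. This is a minimal illustration, not the actual code in client-runtime's jobs_manager.py: the function names `parse_resources` and `job_resources` are hypothetical, and the 8Gi fallback is assumed to match the chart's templated default after reconciliation.

```python
import os

# Assumed code-side fallback, chosen to mirror the chart's templated default.
DEFAULT_RESOURCES = "cpu=2,memory=8Gi"


def parse_resources(spec: str) -> dict[str, str]:
    """Parse a string such as "cpu=2,memory=8Gi" into a dict."""
    resources = {}
    for pair in spec.split(","):
        key, sep, value = pair.strip().partition("=")
        if not sep or not key or not value:
            raise ValueError(f"malformed resource entry: {pair!r}")
        resources[key] = value
    return resources


def job_resources(env=None) -> dict:
    """Resolve requests/limits for a spawned job (hypothetical helper).

    The code default is used only when the variable is absent. Because the
    chart always injects both variables, that branch is effectively dead
    in a chart-based deployment.
    """
    env = os.environ if env is None else env
    return {
        "requests": parse_resources(env.get("RESOURCE_REQUESTS", DEFAULT_RESOURCES)),
        "limits": parse_resources(env.get("RESOURCE_LIMITS", DEFAULT_RESOURCES)),
    }
```

Calling `job_resources({})` exercises the code default, while passing `{"RESOURCE_REQUESTS": "cpu=4,memory=16Gi"}` shows an injected value taking precedence over it.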

The two contains blocks per container also guard against one of the template's two RESOURCE_* blocks being edited without the other.

Test

helm unittest -f 'tests/jobs_manager_test.yaml' ./client   # 17 passed

Companion

The actual reconciliation (code fallback → 8Gi) lives in the runtime: tracebloc/client-runtime#111.

Refs tracebloc/backend#745

🤖 Generated with Claude Code


Note

Low Risk
Test-only change; no Helm template or runtime behavior is modified in this PR.

Overview
Adds helm-unittest coverage in jobs_manager_test.yaml so the jobs-manager chart cannot silently drift from the intended per-spawned-training-job resource env vars.

New cases render jobs-manager-deployment.yaml and assert that RESOURCE_REQUESTS and RESOURCE_LIMITS default to cpu=2,memory=8Gi on both the api (containers[0]) and pods-monitor (containers[1]) containers, each emitted once. A second test confirms operators can override those values via values.env.RESOURCE_REQUESTS / RESOURCE_LIMITS.

This is a contract test for tracebloc/backend#745: the chart is the effective source of truth for what client-runtime sees when spawning training Jobs (companion runtime change is elsewhere).

Reviewed by Cursor Bugbot for commit 431fe9d. Bugbot is set up for automated code reviews on this repo.

… at 8Gi

Add helm-unittest assertions pinning the rendered per-spawned-job
RESOURCE_REQUESTS / RESOURCE_LIMITS env to "cpu=2,memory=8Gi" on both
containers (api + pods-monitor), plus an operator-override case. The chart
is the single effective source of truth for these values (it always injects
them), so this guards against silent drift between the chart and
client-runtime's jobs_manager.py fallback — the drift reconciled in
tracebloc/backend#745.

Refs tracebloc/backend#745

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@LukasWodka

Contributor

👋 Heads-up — Code review queue is at 19 / 8

Above the WIP limit. The team convention is to review existing PRs before opening new work.

Open PRs currently in Code review (oldest first):

Pull from review before opening new work. (This is a nudge from the kanban WIP check, not a block.)

@saadqbal self-assigned this Jun 16, 2026
@saadqbal requested a review from aptracebloc June 16, 2026 09:09
@saadqbal merged commit ac78ba8 into develop Jun 16, 2026
19 checks passed

4 participants